58 PART 2 Examining Tools and Processes

These online commercial platforms have both advantages and disadvantages. On the plus side, online software tends to be sold as a cheaper subscription, paid monthly or annually, and you get continuous upgrades because the software is web based. The main downside is that these platforms have a steep learning curve and take a lot of work to fully adopt, so you have to ask yourself whether that investment makes sense for your project.

Focusing on Open-Source and Free Software

Open-source software refers to software that has been developed and is supported by a user community. Open-source software is typically free, but it still comes with a license that requires you to adhere to certain policies when using the software. In this section, we talk about the two most popular open-source options for statistical analysis: R and Python.

Open-source software

The two most popular and extensive open-source statistical programs are R and

Python:

» R: R is statistical software that has been developed and is maintained by the R user community. It has two interfaces: R GUI, which looks similar to PC SAS and SPSS, and RStudio, which is an integrated development environment (IDE). Analysts often prefer RStudio when developing graphical displays for the web, while R GUI is fine for most statistical work. To run R, you download and install the base application. Then, for specialized functions not included in the base application, you install additional R packages. As with PC SAS, in R you import or connect to datasets, develop and save code files to run on those datasets, and produce output you can save. Base R, R packages, and documentation are available on the Comprehensive R Archive Network (CRAN) server at https://cran.r-project.org.

» Python: Python is an open-source programming language that is often used to analyze data. As with R, Python is developed and maintained by its own user community, and you work with it in a similar way: You develop code that runs against datasets in the Python environment. The Python and R languages are different, though, so code written for one doesn't run in the other. Where R has packages, Python has libraries. Python is available at www.python.org/downloads.
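To make the workflow in the bullets above more concrete, here is a minimal Python sketch of the three steps: import a dataset, run code against it, and produce output you can save. It is only an illustration under stated assumptions. The dataset, the column names, the function names, and the output filename are all made up for this example. The sketch uses only libraries that ship with Python (csv, io, os, statistics, and tempfile). Many analysts instead install third-party libraries for this kind of work, which isn't shown here.

```python
import csv
import io
import os
import statistics
import tempfile

# Hypothetical dataset, written inline so the sketch runs without an external file.
RAW_CSV = """patient_id,age,systolic_bp
1,34,118
2,51,132
3,47,127
4,62,141
"""


def load_rows(text):
    """Step 1: import the dataset by parsing CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows, column):
    """Step 2: run code against the dataset (descriptive statistics for one column)."""
    values = [float(row[column]) for row in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def write_summary(summary, path):
    """Step 3: produce output you can save by writing the summary to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["statistic", "value"])
        for name, value in summary.items():
            writer.writerow([name, value])


# The output location is an assumption: a file in the system's temporary folder.
output_path = os.path.join(tempfile.gettempdir(), "bp_summary.csv")

rows = load_rows(RAW_CSV)
summary = summarize(rows, "systolic_bp")
write_summary(summary, output_path)
print(summary)
```

In a real project, you would typically replace the inline text with a file or database connection and save the code itself as a .py file so you can rerun it when the data changes.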